[pull] main from triggerdotdev:main #209
Merged
## Summary

Adds a second backend for the realtime runs feed (`useRealtimeRun`, `subscribeToRunsWithTag`, `subscribeToBatch`), built to stay healthy when a single busy environment has many subscribers watching many runs at once. It is gated behind a feature flag with the existing backend as the default, so nothing changes for users until it is enabled per environment.

## Design

A run change is published once, as a small self-describing record, to a single per-environment channel. Every feed is then a predicate over that one stream rather than owning a channel:

- A per-instance router indexes the currently-held feeds by run, tag, and batch. When a run changes it hydrates the affected rows once and serializes them once, then fans the result to every matching feed. One hot shared tag watched by many subscribers costs a single database query and serialize, not one per subscriber.
- Feeds that don't match a change are never woken, wake delivery per environment is coalesced on a leading edge (250ms default) so a burst of changes costs one wake, and cold reads coalesce onto a single short-TTL-cached resolve.
- An admission gate bounds how many cold ClickHouse resolves run concurrently, so a mass reconnect across many distinct filters queues instead of stampeding the database.
- Changes that land while a client is between long-polls are delivered on its next poll instead of waiting for the periodic backstop: each environment buffers its recent change records, subscriptions linger briefly after the last feed closes, and a newly-armed poll replays exactly the connection's gap.
- The per-connection replay cursors behind that are shared across instances via Redis (a single timestamp each), so a poll landing on a different instance behind the load balancer still reads the connection's true gap instead of falling back to a cold resolve. Cursor reads have a bounded deadline and degrade to the cold-read path on any Redis trouble.
- Tag subscriptions with multiple tags match runs carrying all of the tags, mirroring the existing backend's filter semantics, and live long-polls hold for about 20 seconds to match its cadence.
- The per-environment channel supports Redis Cluster sharded pub/sub, so the wake path scales horizontally across shards by environment.
- The backend reports its health through OpenTelemetry metrics (delivery lag, poll resolution paths, backstop outcomes, replay and cursor-store activity), with a provisioned Grafana dashboard for local development.

Everything is behind the feature flag and tunable via env vars; the existing backend remains the default.
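
The router described in the design can be pictured roughly as below. This is a minimal sketch, not the code from this PR: `FeedRouter`, `RunChange`, `Filter`, `Feed`, and the `hydrate` callback are all hypothetical names, and the real backend hydrates from a database rather than a synchronous function. It shows two of the stated properties: a change that matches no feed triggers no hydrate, and a change that matches many feeds is hydrated and serialized once. Multi-tag feeds match only runs carrying all of their tags.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
type RunChange = { runId: string; tags: string[]; batchId?: string };
type Filter =
  | { kind: "run"; runId: string }
  | { kind: "tags"; tags: string[] } // matches runs carrying ALL listed tags
  | { kind: "batch"; batchId: string };
type Feed = { id: string; filter: Filter; deliver: (payload: string) => void };

class FeedRouter {
  private byRun = new Map<string, Set<Feed>>();
  private byTag = new Map<string, Set<Feed>>();
  private byBatch = new Map<string, Set<Feed>>();
  private hydrate: (runId: string) => unknown;

  constructor(hydrate: (runId: string) => unknown) {
    this.hydrate = hydrate;
  }

  register(feed: Feed): void {
    const f = feed.filter;
    if (f.kind === "run") {
      this.add(this.byRun, f.runId, feed);
    } else if (f.kind === "batch") {
      this.add(this.byBatch, f.batchId, feed);
    } else {
      // A run carrying all of a feed's tags necessarily carries the first one,
      // so indexing under the first tag is enough; match() does the full check.
      const first = f.tags[0];
      if (first !== undefined) this.add(this.byTag, first, feed);
    }
  }

  private add(index: Map<string, Set<Feed>>, key: string, feed: Feed): void {
    let set = index.get(key);
    if (!set) {
      set = new Set<Feed>();
      index.set(key, set);
    }
    set.add(feed);
  }

  // Returns every held feed whose filter matches the change, without duplicates.
  match(change: RunChange): Feed[] {
    const out = new Set<Feed>();
    const runFeeds = this.byRun.get(change.runId);
    if (runFeeds) runFeeds.forEach((feed) => out.add(feed));
    if (change.batchId !== undefined) {
      const batchFeeds = this.byBatch.get(change.batchId);
      if (batchFeeds) batchFeeds.forEach((feed) => out.add(feed));
    }
    const carried = new Set<string>(change.tags);
    change.tags.forEach((tag) => {
      const tagFeeds = this.byTag.get(tag);
      if (!tagFeeds) return;
      tagFeeds.forEach((feed) => {
        const f = feed.filter;
        if (f.kind === "tags" && f.tags.every((t) => carried.has(t))) out.add(feed);
      });
    });
    return Array.from(out);
  }

  // Hydrates and serializes once per change, then fans out to all matching feeds.
  publish(change: RunChange): number {
    const feeds = this.match(change);
    if (feeds.length === 0) return 0; // nothing matched: no hydrate, no wake
    const payload = JSON.stringify(this.hydrate(change.runId));
    feeds.forEach((feed) => feed.deliver(payload));
    return feeds.length;
  }
}
```

The sketch leaves out the wake coalescing, admission gate, and replay buffer; it covers only the index-and-fan-out step.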
## Summary

Reliability and authorization fixes for realtime chat sessions:

- Session-stream waitpoint delivery is scoped to the environment, so two environments using the same session `externalId` can no longer complete each other's waitpoints.
- The session snapshot-url routes now enforce per-session authorization, and appending to a session's `out` channel requires secret-key auth, so a session-scoped token can't read another session's snapshot or forge assistant output.
- Appends that carry an `X-Part-Id` header are deduplicated on retry, so a retried send can't duplicate a message.
- Session creation rejects expired sessions (instead of triggering a run that can never receive input), `externalId` is immutable after creation, and the sessions list endpoint returns friendly `run_*` ids to match the single-session routes.

## Rollout

The waitpoint cache key gains an environment prefix. To keep waitpoints registered by the previous deploy working across the boundary, the drain reads both the new and the previous key for this release; the legacy read can be removed a release later once no pre-deploy waitpoints remain.
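
The rollout's dual-key read might look something like the following. It is a sketch under assumptions: the key formats, the `Waitpoint` shape, and the in-memory `Map` standing in for the cache are all invented for illustration, and the actual key layout is not given in the description.

```typescript
// Hypothetical waitpoint record and key formats; the real ones are not shown in the PR text.
type Waitpoint = { id: string };

// New key: carries the environment, so identical externalIds in two environments never collide.
function scopedKey(environmentId: string, externalId: string): string {
  return `session-waitpoints:env:${environmentId}:${externalId}`;
}

// Previous key: no environment component. Read only during the transition release.
function legacyKey(externalId: string): string {
  return `session-waitpoints:${externalId}`;
}

// Drains waitpoints registered under either key, new key first.
function drainWaitpoints(
  cache: Map<string, Waitpoint[]>,
  environmentId: string,
  externalId: string
): Waitpoint[] {
  const keys = [scopedKey(environmentId, externalId), legacyKey(externalId)];
  const drained: Waitpoint[] = [];
  for (const key of keys) {
    const entries = cache.get(key);
    if (entries) {
      drained.push(...entries);
      cache.delete(key); // a drained waitpoint must not be completed twice
    }
  }
  return drained;
}
```

Once no waitpoints written by the previous deploy remain, dropping `legacyKey` from the `keys` array would be the whole removal in this sketch.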
…3891)

## Summary

A batch of reliability fixes for `chat.agent`:

- A user message sent while the agent is streaming is no longer delivered twice (which could run a duplicate turn).
- Input appends carry an idempotency key (`X-Part-Id`) so a retried send can't duplicate a message.
- `onTurnComplete` now fires on errored turns with the thrown error attached, and the failed turn's user message is persisted so it isn't lost on the next run.
- Stopping a generation clears the streaming state, so a page reload doesn't replay the stopped turn.
- Custom agents and manual `chat.writeTurnComplete` callers trim the output stream, sending a custom action no longer leaves a second stream reader running, a long-lived `watch` subscription no longer grows its dedupe set without bound, promoting a queued message to steering no longer risks a double-send, and runs keep the full set of dashboard tags.

The `X-Part-Id` header is accepted by current servers (they just don't dedupe on it yet), so this is safe to ship ahead of the matching server change.
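
Two of the fixes above, idempotent appends and a dedupe set that stops growing, rest on the same idea, which can be sketched as follows. This is an illustration only: `BoundedDedupeSet`, `appendPart`, and the oldest-first eviction policy are assumptions, not the SDK's or server's actual implementation.

```typescript
// Hypothetical bounded set of already-seen part ids. Evicts the oldest id once
// maxSize is exceeded, so a long-lived subscription cannot grow it without bound.
class BoundedDedupeSet {
  private seen = new Set<string>();
  private maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Returns true the first time an id is seen, false for a repeat.
  add(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.maxSize) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.seen.size;
  }
}

// Appends a message body unless its part id (the X-Part-Id header value) was already seen.
// Appends without a part id are never deduplicated. Returns whether the body was appended.
function appendPart(
  log: string[],
  dedupe: BoundedDedupeSet,
  partId: string | undefined,
  body: string
): boolean {
  if (partId !== undefined && !dedupe.add(partId)) return false;
  log.push(body);
  return true;
}
```

The trade-off in a bounded set is that a retry arriving after its id has been evicted is no longer recognized, so the bound has to comfortably exceed the number of parts that can arrive within a retry window.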
Created by
pull[bot] (v2.0.0-alpha.4)